Subject: Jupiter performance modeling



1.0 Introduction 

Over the last few months, it has become apparent that the 
performance of the Jupiter CPU is less than had been originally 
estimated. Previous memos have discussed the limiting factors for 
better CPU performance, and it is understood, to a first
approximation, what must be done to improve the performance of the
machine. This memo proposes a strategy for improving performance,
estimates the results of implementing the strategy, and lists the
costs and resources necessary to carry it out.



2.0 Why performance modeling?



At present, the priorities for the Jupiter project indicate that
we must ship a machine with a minimum performance of 2.3 times a
KL10. In addition, if the FRS machine does have a performance
near the minimum requirement, an enhanced machine must be shipped
shortly thereafter. Enhancements that are currently planned are
the APA and the so-called model B machine. In order to be
successful in any of these endeavors, several things must be
understood, as follows:

o What is the current performance of the machine?

o How accurate are the models that are used to predict
performance?

o What constitutes a representative set of programs against 
which performance can be measured? 

o What limits the performance of the machine? 

o What can be done to increase the performance of the machine? 

o What is the result on system performance if changes are made 
to the current design? 



o What should be implemented in the APA and how much does this
improve the performance? 

o What should be implemented in the model B machine and how much 
does this improve the performance? 

All of these questions must be answered in order to meet the goals
of the Jupiter project. To do this, a performance modeling
project must be undertaken that will address each of these items.
This kind of modeling will not only answer the questions above,
but will also reduce the risk to the Jupiter project by minimizing
the surprises later on.

3.0 Performance predictions for various job mixes




In the past, the performance of a system has typically been
measured with three kinds of loads, which can be loosely described
as Fortran, Cobol, and general timesharing. The Fortran mix is
characteristic of scientific applications and includes floating
point and integer arithmetic operations. The Cobol mix represents
business and commercial applications and includes byte,
string, and data type conversion operations. The general
timesharing mix (also known as logic mix) is more nebulous but 
includes such applications as text editors, compilers, debuggers, 
etc. 

Because our machines are used in all three types of applications, 
it is important to be able to predict the performance of the 
Jupiter system on all three kinds of job mixes. In order to do 
this effectively, the modeling process used to predict the 
performance of the machine must include all three types of job 
mixes and not be limited to one. This slightly increases the 
complexity of the task, but it is both necessary and worthwhile. 

The increasing use of extended addressing in both high-level 
languages and assembly language programs poses additional problems 
in trying to predict the performance of the machine. Experience
on the KL10 has shown that pathological programs can show a severe 
performance degradation when moved from unextended to extended 
implementations. Therefore, performance predictions should not 
ignore the effects of extended addressing.



4.0 Minimum machine performance



The ultimate reason for doing performance modeling for Jupiter is 
to attain the system performance goal of 2.3 times a KL10. This 
goal deserves some discussion so that it is clear what it really 
means.

As indicated in section 3.0 above, there are three types of job 
mixes that will be modeled. What, then, does the phrase "2.3 
times a KL10" really mean? The minimum performance number applies 
to the general timesharing (or logic) mix as measured on a Jupiter 
system. Fortran (without the APA) and Cobol job mixes will almost 
certainly run slower than the general timesharing mix. That is 
not to say that the Fortran and Cobol cases will be ignored in 
favor of improving only the general timesharing case. Rather, our 
goal is to improve all cases as much as we can within the other 
constraints of the project. However, if tradeoffs must be made, 
the general timesharing mix will have highest priority. 

Note that the performance of the machine is a direct function of
the machine cycle time. Changes in machine cycle time cause 
linear changes in the performance of the machine. Therefore, the 
determination of the cycle time is as important in predicting the 
machine performance as performance modeling. 

Because there is inherent error in any performance modeling
methodology (performance models are usually optimistic), the
minimum performance prediction goal must be higher than the goal
for the performance of the system. That is, the performance
predictions must actually be higher than 2.3 times a KL10 to
ensure that the real machine will run at that performance.
Therefore, the performance goal used for modeling the machine will 
be 2.5 times a KL10. 
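The margin between the modeling goal and the shipping goal can be
made concrete with a little arithmetic. The sketch below is only
illustrative: it assumes that performance is inversely proportional
to cycle time (one reading of the "linear" relationship noted
above), and the function names and the 5% cycle time slip are
hypothetical.

```python
SHIP_GOAL = 2.3    # minimum shipped performance, in multiples of a KL10
MODEL_GOAL = 2.5   # goal used for modeling, in multiples of a KL10


def modeling_margin(model_goal=MODEL_GOAL, ship_goal=SHIP_GOAL):
    """Fraction by which a prediction may overstate the real machine
    before the real machine falls below the shipping goal."""
    return model_goal / ship_goal - 1.0


def scaled_performance(predicted, assumed_cycle, actual_cycle):
    """Rescale a prediction made at one cycle time to another cycle time.

    Assumption: performance is inversely proportional to cycle time.
    """
    return predicted * assumed_cycle / actual_cycle


print(f"margin: {modeling_margin():.1%}")

# Hypothetical case: a 2.5 prediction whose cycle time slips by 5%.
print(f"after slip: {scaled_performance(2.5, 1.00, 1.05):.2f}")
```

Under these assumptions the margin is a little under 9%, so a cycle
time slip of 5% would still leave the machine above 2.3 only if the
model itself were not also optimistic.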



5.0 Performance improvements through performance modeling 

The specific tasks of this proposal can be broken down into two 
categories. The first category makes the assumption that the 
performance data that exists is substantially correct. Using this 
data, simple microcode and hardware changes can be made to improve 
the performance of the machine. Section 6.1 below describes these
changes in detail (also see the bibliography). These changes
appear to have low risk to the current design and may produce a 
significant performance increase. Whether these changes alone 
will improve the performance enough to meet the goals is unknown. 
More extensive changes at this point would be risky because the 
accuracy of the data on which additional changes would be based is 
unknown. 

The second category seeks to provide the additional data and the 
analysis tools necessary to understand what the performance of the 
machine really is. The data and tools produced will be used not 
only to direct additional changes to the design, if necessary, but
also to accurately predict the performance of the machine before
it is built. This kind of modeling, which was not done during the 
design of the original machine, is critical if the performance 
goals are to be met in a predictable manner. 

The proposal includes three areas of study: analysis and
reduction of existing data, benchmark selection, and additional 
data gathering and analysis. Analysis of the existing data may 
help direct the task of making the simple microcode and hardware 
changes. Benchmark selection and additional data gathering and 
analysis are tightly coupled and are necessary both to accurately 
understand the performance of the machine, and to direct 
additional changes if that becomes necessary. 



6.0 Specific tasks for performance modeling 

This section discusses the specific tasks that should be 
undertaken for the Jupiter performance modeling project. For each 
of the four tasks listed, there are discussions of the goals, 
justification, benefit, cost and strategy for completing the task. 
There is no implied priority in the order in which the tasks are 
listed. In fact, several of these tasks should be undertaken in 
parallel. 



6.1 Initial performance improvements 

Based on preliminary investigations, it is known that there are 
certain classes of instructions that seem to limit the performance 
of the CPU. There are also certain microcode and "simple" 
hardware changes that can be made to increase the performance of 
the CPU with small redesign cost. 



6.1.1 Goals 

Make microcode and "simple" hardware changes to increase the 
performance of the machine, measure, with simulation, the 
resulting performance improvement for each change, and attempt to 
predict the change to the overall CPU performance. 



6.1.2 Justification 

In looking at the possibilities for improving the performance of 
the CPU, this type of change results in the smallest amount of 
hardware redesign. In addition, more extensive changes would be 
risky at this time because there isn't enough accurate data with 
which we can make design decisions. 






6.1.3 Benefit

... simulate the changes, implement the design changes in ...
machine resources to do simulations, microcode ... etc. will be
required.



6.1.5 Strategy

The strategy for completing this task may be broken down into five
categories, as follows:

1. ... instructions that seem to ...

o EA-calc speed-up in the EBOX. Evaluate the possibility of
adding hardware to the EBOX to increase the speed of
EA-calc done by the EBOX. Such a change will improve the
performance of byte, string, and XCT instructions, and
indirect addressing.

o Byte pointer decode speed-up. Evaluate changes to the
hardware (probably the micro-machine next-address
dispatches) to make byte pointer decode faster. Such a
change will improve the performance of byte and string
instructions.



The following items investigate improvements to the EBOX and
IBOX microcode algorithms to make the instructions faster.
Some minimal hardware changes may also be required.



o Other byte instruction speed-ups. 
o BLT/XBLT speed-up. 



o PUSHJ speed-up.

o XCT speed-up.

o String speed-up.

Design changes resulting from the investigation of the above
list fall into the following areas:

o EBOX EA-calc hardware additions. 

o EBOX and IBOX micro-machine dispatch changes. 

o Other minimal EBOX and IBOX hardware changes.

o EBOX microcode algorithm changes.

o IBOX microcode algorithm changes (mostly the addition of
new ICMDs).



2. Microcode implementation. Implement the EBOX and IBOX
microcode changes that resulted from the design process.

3. LISP simulation. Implement the hardware changes that resulted
from the design process in the LISP simulator. Use the
modified simulator to ensure that the CPU continues to
implement the PDP-10 architecture. Then use the simulator as
a tool to measure the performance of the machine.

4. Performance predictions. When the new performance for each
instruction has been measured by the LISP simulator, combine
that data with the best-guess machine cycle time and the
simulated workload data that we have to predict the
performance improvement.

5. Iterate. If the predicted performance is not 2.5 times a
KL10, go back to step 1.



6.2 Reduction of current OPHIST data 

Instruction histogram data has been obtained from several sites
with the OPHIST program. Reduction of this data is required if 
decisions are to be made on the basis of the data. 



6.2.1 Goals 

Reduce the large amount of raw data that exists such that we know 
correlations within each site and across sites. Produce an
ordered list of "problem" instructions.



6.2.2 Justification



All OPHIST analysis to date has been done by manually correlating 
the data. There is no real confirmation that the order of 
investigation that was given in section 6.1.5 is correct. There 
is also minimal data on what is most important for the APA. 



6.2.3 Benefit 



This task provides a confidence factor that the priorities are 
indeed correct, especially in determining the sensitivity of the 
data that exists. It also produces an ordered list of the
instructions that are important for both the APA and non-APA
cases.



6.2.4 Cost 



The primary cost of this task is the programmer necessary to write 
the data reduction programs. In addition, machine time is
required to write, debug, and run the programs. 



6.2.5 Strategy 

The strategy for this task may be broken down into the following 
components: 

1. Decide what correlations we need. It seems obvious that we 
need to know the sensitivity of data for each site, and across 
multiple sites. 

2. Produce a list of the "most important" instructions for both 
APA and non-APA cases from the KC-weighted histograms. At 
present, such a breakdown for individual samples exists, plus
a high-level summary for all samples. Additional breakdowns
with more detail are required to direct the design changes. 

3. Change the data reduction programs so that it is easy to 
change the machine cycle time and the performance of each 
instruction. 

4. Attempt to define a "measure of goodness" using the OPHIST 
data so that we can predict the relative performance impact on 
the system of a change to the cycle time or the performance of 
a single instruction. Ideally, the result of this item will 
be to produce a single number that characterizes the 
performance of the machine with any workload. Changes in 
system performance as the result of changes to the cycle time 
or instruction performance would then be directly proportional 
to the change in the number. 
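Items 2 through 4 can be sketched as a small data reduction
routine. The following is a minimal illustration, not the existing
reduction programs: the function name, the opcode counts, and the
cycles-per-instruction figures are all hypothetical. It ranks
instructions by their share of compute time and treats the total
time as the single number, with cycle time and per-instruction
performance as easily changed parameters.

```python
def kc_weights(histogram, kc_cycles, cycle_time=1.0):
    """Return (total_time, ranked) for an opcode histogram.

    histogram  -- opcode -> execution count (e.g. as gathered by OPHIST)
    kc_cycles  -- opcode -> machine cycles per execution (hypothetical)
    cycle_time -- machine cycle time, in any consistent unit
    ranked     -- list of (opcode, share of total time), largest first
    """
    times = {op: n * kc_cycles[op] * cycle_time
             for op, n in histogram.items()}
    total = sum(times.values())
    ranked = sorted(((op, t / total) for op, t in times.items()),
                    key=lambda pair: pair[1], reverse=True)
    return total, ranked


# Hypothetical counts and cycle counts, for illustration only.
histogram = {"MOVE": 5000, "JRST": 3000, "LDB": 1000, "BLT": 100}
kc_cycles = {"MOVE": 1, "JRST": 1, "LDB": 6, "BLT": 40}

total, ranked = kc_weights(histogram, kc_cycles)
for op, share in ranked:
    print(f"{op:5s} {share:6.1%}")

# Effect of speeding up a single instruction on the single number.
faster = dict(kc_cycles, LDB=3)
new_total, _ = kc_weights(histogram, faster)
print(f"relative performance with faster LDB: {total / new_total:.2f}")
```

With this made-up data, LDB is executed far less often than MOVE
but heads the list, which is the kind of ordering the "most
important" instruction list is meant to expose.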






6.3 Benchmark selection 

In order to accurately predict the performance of a CPU, 
benchmarks that are representative of actual workloads must be 
available. This includes Fortran, Cobol, and general timesharing 
benchmarks. 



6.3.1 Goals 

Produce a list of representative benchmarks which can be used to 
predict the performance of Fortran, Cobol, and general timesharing
job mixes.



6.3.2 Justification

At present, there is no way to characterize the performance of the
three kinds of instruction mixes that we are worried about. Up to
now, performance predictions have been based on hand evaluation of
OPHIST data.



6.3.3 Benefit 

This task produces a representative set of benchmarks which can be 
used to measure and predict the performance of the CPU. In 
addition, these benchmarks allow us to evaluate changes to the 
cycle time and instruction performance. 



6.3.4 Cost 

The primary cost in completing this task is the manpower necessary 
to do the selections. There is also additional time involved in 
getting others to agree that the selections are indeed 
representative. 

6.3.5 Strategy 

This task provides benchmarks in three areas, as follows: 

o Opcode histograms. Opcode histograms (via OPHIST or other
program) that are representative of Fortran, Cobol, and logic
mix programs are needed. We may be able to construct
composite opcode histograms out of the work done to reduce
existing OPHIST data.

o Programs as input to instruction simulators. These programs
will be used to measure the impact of conflicts, cache and
translation buffer hits, etc. See section 6.4.5 for more
detail.

o Programs as input to the LISP simulator. These programs will
be used to measure performance, instruction sequence
interactions, etc. Because of the simulation rates and the
limitations of the LISP simulator, such programs must run for
less than 1 CPU second on a KL10, and issue no monitor calls.




Work has already been done by a number of groups in this area.
There is a collection of programs commonly referred to as the
"dirty dozen" that are allegedly representative of the general
timesharing mix. Single and double precision Whetstones exist
that provide some indication of the Fortran performance. The
monitor group has a collection of benchmarks that may be helpful. 
Some work was done on Cobol performance for the Dolphin project 
and a composite Cobol program was constructed that was supposed to 
be representative of what typical Cobol programs do. 

If priorities must be assigned, it is most important to select 
representative benchmarks for the general timesharing mix. The 
performance goals are based on that mix and these benchmarks will 
be used to decide whether additional work is necessary to increase 
the performance of the machine. 



6.4 Additional data gathering and investigation

Due to the scarcity of performance data, it is critical to the 
success of the project to gather and evaluate additional data. 
This process is important not only for the FRS machine, but also 
for the APA design and the design of the model B machine. 



6.4.1 Goals

Produce additional data and analysis tools that will increase the
accuracy of our performance predictions. Quantify the effects of
extended addressing, indirect addressing, IBOX conflict, IBOX
flush, IBOX prefetch efficiency, translation buffer conflicts, and
cache hit.




6.4.2 Justification



At present, performance estimates are based on OPHIST results 
only. There is general agreement that the OPHIST results 
approximate the characteristics of real workloads, but only 
minimal attempts have been made to confirm this speculation. In 
addition, there is no actual data on the impact of things like
IBOX efficiency, conflicts, the translation buffer, etc. This
part of the modeling process is the most critical in understanding 
the real performance of the system. 



6.4.3 Benefit 

By producing additional data and analysis tools, performance 
predictions will be more accurate. It will also quantify the 
(currently unknown) effects of the IBOX. 



6.4.4 Cost 

Of all the components of the performance modeling proposal, the 
costs associated with this section are the largest. Completion of 
the items listed below will require one or more people and 
significant machine resources. The simulations are not possible 
with the load averages on existing machines. 






6.4.5 Strategy 

The strategy for this task is broken down into multiple areas, as 
follows: 

1. Additional OPHIST data. OPHIST data must be gathered ...
additional sites whose typical load is Fortran ...
general timesharing. More sites will increase the ...
the accuracy of our performance predictions ...
especially true if there is a good correlation between ...
with similar workload characteristics. An attempt ...
made to select at least two sites whose typical workload ...
general timesharing, Fortran, and Cobol. At least ...
data is required from each site to smooth out the ...
variations in load.

2. TRACKS microcode on the KL10. TRACKS microcode ...
for several reasons, as follows:

o Verification of OPHIST results. In one data gathering
mode, TRACKS microcode will keep an opcode histogram of
instruction execution. By running OPHIST on a machine
with TRACKS opcode counting enabled, we should get an
indication of the validity of the OPHIST measurement
technique. This is particularly important since many
decisions are being based on the OPHIST data.

o Exec mode measurements. Parts of the monitor run with the
PI system off. Since OPHIST uses the interval timer as a
stimulus, it can't sample those areas of the monitor. The
exact impact of this is unknown, but there is a general
feeling that certain important parts of the monitor (e.g.,
indirect references to the CST) are being masked.

o PC traces. In another data gathering mode, TRACKS has the 
ability to generate PC traces. These traces could be used 
as input to a program for analysis of instruction 
sequences. 



3. Conflict analysis. At present, there is no data on the
effects of instruction conflicts. By using CONF20, a program
written for the Dolphin project, with "typical" programs, data
can be gathered concerning conflicts. At present, CONF20 will
only run in section zero, and it needs some work. It should
be modified to run in a non-zero section and analyze
multi-section programs.

4. Enhancements to the LISP simulator. In order to determine the
effects of IBOX flushes and the efficiency of the IBOX
prefetch algorithms, the LISP simulator must be modified to
keep more data about the programs that it is simulating.
Counters can be installed that will keep track of IBOX
flushes, conflicts, guess-wrong, and the average number of
instructions ahead that the IBOX is fetching. Because of
simulation ratios, the selection of programs may be difficult.

5. Translation buffer and cache hit analysis. By using a program
similar to SIM20, address traces can be obtained for
representative programs. These traces can then be analyzed by
an existing cache simulator to give us an indication of the
effectiveness of the translation buffer and the cache. If
SIM20 is used to provide the address traces, it must be
modified to run in a non-zero section and measure
multi-section programs.

This item is particularly important because we have no
up-to-date data on the effects of extended addressing on the
translation buffer and cache organizations. A thorough
analysis of this topic also requires data on context switch
time. First-order cache models assume that the cache has
reached steady-state. If the context switch rate is too high,
the cache doesn't have a chance to reach steady-state and the
cache hit predictions will be too high.
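The steady-state point can be illustrated with a toy simulation.
The sketch below is not the existing cache simulator and does not
model the Jupiter cache organization. It assumes a cache large
enough to hold a program's whole working set, a reference stream
that simply cycles through that working set, and a context switch
that empties the cache; all names and figures are hypothetical.

```python
def hit_ratio(num_refs, working_set, quantum=None):
    """Hit ratio for a cyclic reference stream.

    num_refs    -- number of memory references to simulate
    working_set -- number of distinct cache lines the program touches
    quantum     -- references between context switches; None means
                   the program is never switched out
    """
    cached = set()
    hits = 0
    for i in range(num_refs):
        if quantum is not None and i > 0 and i % quantum == 0:
            cached.clear()          # context switch: all lines are lost
        line = i % working_set      # cycle through the working set
        if line in cached:
            hits += 1
        else:
            cached.add(line)        # cold miss, line is loaded
    return hits / num_refs


steady = hit_ratio(100_000, 1_000)
slow_switching = hit_ratio(100_000, 1_000, quantum=10_000)
fast_switching = hit_ratio(100_000, 1_000, quantum=2_000)
print(steady, slow_switching, fast_switching)
```

In this toy case the hit ratio falls as the quantum shrinks toward
the working set size, so a model that assumes steady-state would
overstate the hit ratio for a heavily switched workload.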



7.0 Ordering the tasks




Under the assumption that the existing performance data is
correct, we can start the design and implementation process for
the simple hardware and microcode changes immediately. In
parallel with this, it is important to do the data reduction on
the existing OPHIST data so that we can measure the relative
effect of the design changes. Although it may not be entirely
accurate, the OPHIST data is all that we have with which to
measure performance. In addition, OPHIST data can be used as an
indication of relative performance changes when a design change is
made.

Benchmark selection can be done somewhat asynchronously to the 
design changes and the reduction of the OPHIST data. It would be 
useful to have some representative benchmarks available when the 
simulator and microcode changes are completed so that some
preliminary performance predictions can be made. Benchmarks must
be available in order to do any significant data gathering, beyond 
what already exists. 

Most of the additional data gathering and analysis is independent 
of the simple design changes but is dependent on benchmark 
selection. Completion of this task is required before any 
accurate performance predictions can be made. In addition, no 
significant hardware changes can be made, beyond those outlined in 
the above sections, until this task is complete. Besides the 
required manpower, this task depends on machine resources that are 
not currently available. 



8.0 Conclusions 



No serious performance modeling was done during the design of the 
original machine. As a result, the performance is less than we 
expected and some redesign is being done. This memo proposes a 
performance modeling project that will aid us in making decisions 
about the design changes, predict the performance of the machine, 
and minimize the risk to the project. 

The importance of this project should not be underestimated if we 
are to meet our performance goals in a predictable way. In 
addition, no significant hardware redesign should take place 
without understanding the effect on system performance of that 
change. 

The data and tools that are produced by the performance modeling
project will be used not only for redesign of the FRS machine, but
also as direction for the design of the APA and the model B
machine.




The ultimate machine performance is directly related to the cycle
time. Determination (and minimization) of the cycle time is
critical to an accurate performance prediction and hence, to
machine performance. That determination must be given equal
priority to performance modeling.



APPENDIX A
BIBLIOGRAPHY 



The following bibliography lists documents relating to PDP-10 
performance. Copies of these documents are available from me. The list 
is in chronological order. 

1. Newman, Michael, "Improved performance String Move", Digital 
interoffice memorandum, June 23, 1977. Analysis of improved 
string performance. 

2. Murphy, Dan, "Page Refill Timing Data", Digital interoffice 
memorandum, December ^, 1977. Analysis of page refill 
operations on the KL10 under KL10 paging. 

3. Hess, Ted, "Possible 2080 performance", Digital interoffice
memorandum, February 29, 1980. Simplistic predictions of the
2080 performance based on work done on the Dolphin project.

4. Hess, Ted, "Possible 2080 Floating-point Performance", Digital
interoffice memorandum, March 5, 1980. Simplistic predictions
of the 2080 floating point performance (with and without an
APA).

5. Miller, Arnold, "2080 extended addressing performance", Digital 
interoffice memorandum, May 24, 1982. Thoughts on the impact 
of extended addressing on the performance of a Jupiter system. 
Concentrates on the effects of the translation buffer 
organization and the effects of indirect addressing. 

6. Manley, Dwight, "Jupiter workload analysis", Report on
performance analysis, September, 1980. Analysis of Fortran and 
Cobol programs to predict the efficiency of the IBOX. 

7. Uhler, Mike, "Jupiter Performance", Digital interoffice 
memorandum, September 6, 1982. Describes the preliminary 
results of the performance analysis done for Jupiter including 
a simple performance model. 

8. Nixon, David, "performance prediction report for Jupiter", 
Digital interoffice memorandum, October 5, 1982. Summary and
analysis of the OPHIST data.

9. Nixon, David, "Performance Prediction Method for Jupiter", 
Digital interoffice memorandum, November 4, 1982. Discussion 
of the methodology used to gather and analyze the OPHIST data. 

10. Uhler, Mike, "Minutes of the 11/24/82 Performance Committee 
Meeting", Digital interoffice memorandum, November 25, 1982. 
Describes the proposed structure of the Jupiter performance 
committee. Lists a hierarchy of proposed solutions and the 
resources necessary to support the investigations. 



Date: 06 Sep 82
From: Mike Uhler
Dept: L.S.E.G.
DTN: (8-) 231-6448
Loc/Mail stop: MRO1-2/E85
Net mail: UHLER at 10 



Subject: Jupiter performance 



1.0 Introduction 

Over the past few weeks, Judy Hall and David Nixon have been
gathering workload data from several "typical" systems in an
attempt to characterize the performance of the Jupiter. The data
gathering techniques being used are fully described in a memo by
David Nixon. In looking at the initial results, I have developed
a simple model that provides a first-order approximation of the
performance of the Jupiter. In addition, we have identified what
we believe to be a significant performance bottleneck in the EBOX
speed of certain instructions. This memo describes the model, 
identifies certain classes of instructions that appear to be 
performance bottlenecks, and makes recommendations about possible 
microcode/hardware solutions to these bottlenecks. 



2.0 The performance model 

In looking at the original workload data, I noticed that the set 
of executed instructions seemed to fit into three broad categories 
as follows:

1. Instructions whose KC/KL ratio is less than 3.

2. Instructions whose KC/KL ratio is between 3 and 6. 

3. Instructions whose KC/KL ratio is greater than 6. 

After some simple calculations on the preliminary data, I 
concluded that the percentage of executed instructions in each 
class was approximately: 

Class                  Percentage of instructions in class

1 (< 3 times KL)        25%
2 (3 - 6 times KL)      25%
3 (> 6 times KL)        50%
                       ----
                       100%

This means that half the executed instructions run at least 6 
times faster on a Jupiter than the same instructions on a KL10. 
Intuition would lead one to believe that the performance of the 
Jupiter would be outstanding. But let's calculate the predicted
performance of the machine using this model and choosing one 
"average" number for each class. 

Class    KC/KL ratio    Percentage    Weighted time

  1          1.5            25%           0.167
  2          3.0            25%           0.083
  3          8.0            50%           0.063
                                          -----
                                          0.313

The "weighted time" column was computed by multiplying the 
percentage for each class by the inverse of the KC/KL ratio, e.g., 

0.25 * (1/1.5) = 0.167

This number gives the time, in KL units, that the class of 
instructions would take to execute on the Jupiter. The sum of the 
column gives the total time, again in KL units, that all
instructions would take on a Jupiter. The inverse of this number
gives the predicted performance of the Jupiter. In this case, the
predicted performance ratio is 3.2. 

If the model says that half the instructions run at 8 times a 
KL10, why is the overall predicted performance only 3.2? Let's 
look at the weighted time column for the answer. The class 3 
instructions, which amount to 50% of the executed instructions,
account for only 20% of the execution time (0.063/0.313). The
class 1 instructions, on the other hand, which amount to only 25%
of the executed instructions, account for over 50% of the
execution time. This non-intuitive behavior means that the
instructions that are relatively slow on the Jupiter make up a
large part of the total execution time even if they are a
relatively small percentage of the total instructions executed.
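The calculation above can be written out as a short sketch. The
class ratios and percentages are the ones given in the table; the
function name and data layout are illustrative only.

```python
def predicted_performance(classes):
    """Apply the three-class model.

    classes -- list of (KC/KL ratio, fraction of executed instructions)

    Each class takes fraction / ratio units of KL time on the Jupiter.
    Returns the predicted performance ratio (the inverse of the total
    time) and each class's share of the execution time.
    """
    weighted = [fraction / ratio for ratio, fraction in classes]
    total = sum(weighted)
    shares = [w / total for w in weighted]
    return 1.0 / total, shares


# (KC/KL ratio, fraction of instructions) for classes 1, 2 and 3.
baseline = [(1.5, 0.25), (3.0, 0.25), (8.0, 0.50)]

ratio, shares = predicted_performance(baseline)
print(f"predicted performance: {ratio:.1f} times a KL10")
for number, share in enumerate(shares, start=1):
    print(f"class {number}: {share:.0%} of execution time")
```

The shares computed this way reproduce the figures quoted in the
text: class 3 accounts for 20% of the execution time and class 1
for slightly more than half.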

Let's see what happens if we adjust the KC/KL ratio for one class.
First, assume that the class 3 instructions actually run at 4 
times a KL10 (as would be the case if the EBOX were constantly 
waiting for the IBOX to finish setting up an instruction). 

Class    KC/KL ratio    Percentage    Weighted time

  1          1.5            25%           0.167
  2          3.0            25%           0.083
  3          4.0            50%           0.125
                                          -----
                                          0.375

Predicted performance: 2.7

A 50% change in the performance of the class 3 instructions only
makes a 16% change in the performance of the overall machine.

Let's see what happens if we change the speed of the class 1
instructions instead by assuming that they are only 1.2 times a
KL10.

Class    KC/KL ratio    Percentage    Weighted time

  1          1.2            25%           0.208
  2          3.0            25%           0.083
  3          8.0            50%           0.063
                                          -----
                                          0.354

Predicted performance: 2.8

A 20% change in the speed of the class 1 instructions made a 12%
change in the performance of the overall system and the machine is
now spending 59% of the EBOX compute time processing these
instructions.
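The two adjustments can be compared side by side with a small
sketch. The ratios and percentages are taken from the tables in
this section; the function and variable names are illustrative
only.

```python
def predicted_performance(classes):
    """Inverse of the total weighted time for (ratio, fraction) pairs."""
    return 1.0 / sum(fraction / ratio for ratio, fraction in classes)


def class_share(classes, index):
    """Fraction of the execution time spent in one class."""
    times = [fraction / ratio for ratio, fraction in classes]
    return times[index] / sum(times)


baseline = [(1.5, 0.25), (3.0, 0.25), (8.0, 0.50)]
class3_slowed = [(1.5, 0.25), (3.0, 0.25), (4.0, 0.50)]  # 8.0 -> 4.0
class1_slowed = [(1.2, 0.25), (3.0, 0.25), (8.0, 0.50)]  # 1.5 -> 1.2

base = predicted_performance(baseline)
cases = [("class 3 at 4.0", class3_slowed),
         ("class 1 at 1.2", class1_slowed)]
for name, case in cases:
    ratio = predicted_performance(case)
    loss = 1.0 - ratio / base
    print(f"{name}: {ratio:.1f} times a KL10, {loss:.0%} below baseline")

print(f"class 1 share of time: {class_share(class1_slowed, 0):.0%}")
```

Halving the speed of half the instructions costs only slightly more
overall performance than a 20% slowdown of the slowest quarter,
which is the non-intuitive result the model is meant to expose.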



2.1 Significance and accuracy of the model

In the beginning, the development of the model was an attempt to
predict the performance of the Jupiter using a very simple, easy
to change model of the machine. From that beginning, it has
developed into a tool for understanding why the performance of the
machine isn't what we thought it should be. As the calculations
in the previous section demonstrate, one can change the predicted
performance of the machine by similar amounts, either by making
large changes in the performance of the fast instructions or by
making small changes in the performance of the slow instructions.
By using a simple model, it is much easier to understand this
non-intuitive behavior.

Since I developed the original model, I have seen more workload
data that makes me believe that the model is actually optimistic.
I believe that the typical KC/KL ratios for classes 2 and 3 are
reasonably accurate at 3 and 8 respectively. If this assumption
is correct, the "average" KC/KL ratio for class 1 must be
unrealistically large, even when it is set at 1.2. Further
analysis is required to determine the correct numbers to be used
in the model.

As with all simple models, this one doesn't exactly predict the
true performance of the machine. It is, however, a first-order
approximation of the performance characteristics of the Jupiter
and it does demonstrate that the slow EBOX instructions will be
the limiting factor in the speed of the overall machine.



3.0 Characterizing the slow instructions 

In looking at the workload data from various sites that has been
sorted by KC weight (i.e., the percentage of EBOX compute time for
each instruction), one observes that the relative position of each
instruction changes for each site. However, the same instructions
always seem to appear somewhere near the top of the list. These
instruction classes are listed below. The table is given in
alphabetical order and does not reflect the actual order of
frequency.

1. BLT

2. Byte (LDB, IDPB, etc.)

3. Floating point (both single and double)

4. PUSHJ/POPJ

5. String (MOVSLJ, CVTxxx, etc.)

6. XCT

The data that we have indicates that these six instruction classes
account for 30 to 80 percent of the total EBOX compute time. As
such, changes in performance of these instructions could have a
significant impact on the overall performance of the machine. I
believe that we should be concentrating on optimizing the
performance of these instructions.

4.0 Possible Microcode/hardware optimizations 

I have done a cursory investigation of each of these classes of 
instructions and I believe that certain changes are possible that 
could significantly increase the performance of certain classes. 
This section is broken down into subsections, one for each class 
of instruction. Each subsection describes the results of the 
investigation and gives recommendations for each class.

4.1 BLT 

BLT appears to spend a significant amount of time loading the
read/write address into the EA buffer. Changing the EBOX and IBOX
microcodes to use new functions which allow more overlap in the
read/write of words appears to make a significant difference.
There may also be some potential in using two-word reads to get
source data.

Estimated performance improvement: 2.0-3.0. 

4.2 Byte (LDB, IDPB, etc.) 

A quick count of microcycles seems to indicate that byte
instructions spend their time doing the following:

1. Byte pointer typing, validation - 40%

2. Byte pointer eacalc             - 30%

3. Byte manipulation               - 30%

The first two items have the most potential for improvement.
Adding new dispatches may improve the ability to determine the
byte pointer type quickly. Improvement in the eacalc time (see
XCT below) could improve the byte pointer eacalc time. Additional
hardware to decode the byte pointer and perform the byte
manipulation would be required to make a drastic change in the
performance of these instructions.

Estimated performance improvement: 1.1-2.0.
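
To see how the three-way time split above bounds the achievable
gain, one can combine per-component speedups in the usual way: the
new time is the sum of each fraction divided by its speedup. The
fractions below come from the microcycle count above; the speedup
factors are purely hypothetical inputs chosen to exercise the
formula, not estimates from this memo.

```python
def overall_speedup(parts):
    """parts: list of (fraction_of_time, speedup_factor) pairs."""
    new_time = sum(fraction / speedup for fraction, speedup in parts)
    return 1.0 / new_time


# (fraction of byte-instruction time, hypothetical speedup)
byte_parts = [
    (0.40, 2.0),   # byte pointer typing and validation
    (0.30, 1.5),   # byte pointer eacalc
    (0.30, 1.0),   # byte manipulation, left unchanged
]

byte_speedup = overall_speedup(byte_parts)
print(round(byte_speedup, 2))
```

With these assumed inputs the result falls inside the 1.1-2.0 range
quoted above; the untouched 30% of byte manipulation limits the gain
unless additional hardware is added.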

4.3 Floating point (both single and double)



Because of the lack of a 72 bit data path in the EBOX, there isn't
much that can be done in microcode to improve these instructions.
The addition of the FPA should improve the performance of these
instructions significantly.

Estimated performance improvement (with FPA): 2.0-5.0 



4.4 PUSHJ/POPJ 

PUSHJ spends most of its time determining what to store in the
stack word and how to update the stack pointer. Some improvement
can be gained by adding new dispatches to allow the microcode to
check more conditions in parallel. We may also gain some
improvement from a change to the IBOX microcode.

Estimated performance improvement: 1.1-1.8. 

The POPJ instruction has no extraneous microcycles as it is
presently coded. I see no real improvement possible for this
instruction without hardware changes.



4.5 String (MOVSLJ, CVTxxx, etc.)

I know the least about this class of instructions. From what I do
know about them, it appears that they have significant potential
for improvement. Special casing certain common operations,
avoiding the eacalc on every byte (if possible), and careful hand
optimization could make a large difference. We may also be able
to take advantage of any changes that improve the performance of
the byte instructions.

Estimated performance improvement: Unknown



4.6 XCT 

Most of the time spent in the XCT instruction is spent performing
the eacalc on the executed instruction and fetching its operands.
Improving the speed of the eacalc subroutine could significantly
increase performance of this instruction. Unfortunately, only
additional hardware will make this possible. Improving the eacalc
speed will also benefit byte and string instructions and IBOX
traps to EBOX for indirect instructions. An IBOX that processes 1
level indirect doesn't solve the problem for XCT and byte
instructions; the EBOX eacalc routine must be made faster.

Estimated performance improvement (with hardware): 1.5-2.0



4.7 Impact on system performance

We have not yet used the estimated performance numbers given above
to analyze the impact of making each change on overall system
performance. This work should be completed in the next week or
two and that data will give us an ordered list of optimizations to
make.



5.0 Summary of findings 

Given this data, one can make certain statements about the 
performance of the machine, both in general and in specific terms. 

The performance of a machine is a function of ALL the instructions
executed on that machine. Significantly increasing the
performance of one class of instructions while ignoring another
class tends to result in a machine whose performance is bound by
the class that was ignored. Better overall system performance is
achieved by increasing the performance of all instructions by
approximately the same amount.

The primary performance bottleneck is the EBOX compute time of the
slow instructions. Typically, EBOX processing of this class,
which amounts to approximately 25% of the executed instructions,
takes 60 to 80 percent of the system.

Microcode changes can be made to significantly increase the 
performance of the machine by optimizing certain of the critical 
instructions. More analysis must be done to predict the overall 
change to system performance. 

Certain hardware changes can be made to further increase the
performance of the machine. These changes should be made with 
careful attention given to the benefit/risk tradeoff. Adding new 
dispatch bits so that the microcode may check several conditions 
in parallel may prove to be the most beneficial change. 

Instructions whose KC/KL ratio is 3 or greater are not worth 
optimizing at this point since the resulting change in performance 
is negligible. 

The efficiency of the IBOX seems to have only second-order effects
on the overall performance of the system. This has been confirmed
with initial IBOX simulation data. It is possible that this could
change if the performance of the slow instructions is
significantly improved, although the available data doesn't seem
to indicate that this will happen.



6.0 Recommendations 

The only realistic way to solve these problems is with a top-down
approach. We cannot afford to implement solutions and then design
them. We must evaluate all changes from a system viewpoint and
know in advance what impact those changes are going to have on the
performance of the system.

I suggest forming a working group consisting of knowledgeable 
people in the areas of architecture, performance, microcode, and 
hardware design whose charter would be to oversee any changes that 
are made.

There are indeed problems with the performance of the Jupiter CPU. 
Fortunately, there are also solutions to quite a few of these 
problems and the potential exists to significantly increase the 
performance of the machine. 

NET: KL2102:: or MRVAXi:

SUBJ: 2080 extended addressing performance

1.0 Executive summary 

Much has been said about the performance concerns of Jupiter.
One of the most important elements of this concern is the
performance of "extended addressing" on the Jupiter. This is
so because all of our products either now use extended
addressing, or plan to use it soon. Furthermore, our
customers' expectations for an efficient and timely technology
for handling large data bases are quite high.

Extended addressing performance concerns derive from two
distinct problems: the caching of page pointers by the MBOX
and the cost of indirect addressing.

We've already done a great deal of research into the page cache
problem. The design of the 2080 MBOX and page refill algorithm
reflect what we have learned. No doubt more could be done, but
we don't understand the benefits and weaknesses of the new
design sufficiently well to undertake more design work. The
KL-10 pager, however, could benefit by some of this knowledge.

Indirect addressing performance on the 2080 is quite another
story. Whereas indirect addressing on the KL-10 operates
acceptably, the projected performance of indirect addressing on
the 2080 raises significant concerns. The architecture of
extended addressing relies heavily on indirection and both the
monitor and user-level programs reflect this architecture.

2.0 Page cache 

The KL-10 has a well-publicized deficiency in the caching of
paging information. The KL-10 can cache up to 512 valid paging
translations, but entries are replaced four-at-a-time (block
size of four). Therefore the effective caching is reduced
somewhat by frequent conflicts. This limited cache size
affords many opportunities for programs to thrash by having two
or more of the addresses involved in executing an instruction
conflict. An "address conflict" occurs when two addresses
occupy the same location in the paging cache (also known as the
translation buffer); in other words, caching one of the
addresses invalidates the entry for one or more of the other
addresses. Since caching is page-by-page, conflicts can easily
remain in force over many instructions or many data words.
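
A minimal sketch of how an address conflict arises in a one-way
associative (direct-mapped) translation buffer. The 512-entry size
comes from the text and 512 words is the PDP-10 page size; the choice
of index bits (low-order bits of the virtual page number) and the
omission of the four-at-a-time replacement are simplifying
assumptions, not a description of the actual KL-10 hardware.

```python
PAGE_WORDS = 512       # PDP-10 page size in words
TB_ENTRIES = 512       # translation-buffer size quoted in the text


def tb_index(virtual_address):
    """Hypothetical index function: low-order bits of the page number."""
    return (virtual_address // PAGE_WORDS) % TB_ENTRIES


class DirectMappedTB:
    """One-way associative buffer: each index holds at most one page."""

    def __init__(self):
        self.entries = {}   # index -> virtual page number
        self.misses = 0

    def reference(self, virtual_address):
        page = virtual_address // PAGE_WORDS
        index = tb_index(virtual_address)
        if self.entries.get(index) != page:
            self.misses += 1            # refill evicts the previous occupant
            self.entries[index] = page


# Two addresses whose pages map to the same index evict each other.
a = 0o1000                               # an address in page 1
b = a + TB_ENTRIES * PAGE_WORDS          # page 513, same index as page 1
tb = DirectMappedTB()
for _ in range(10):
    tb.reference(a)
    tb.reference(b)
print(tb.misses)
```

In this model every one of the alternating references misses, which
is the thrashing case described above; with two-way or higher
associativity both pages could be resident at once.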

In an effort to ameliorate this, the 2080 has a much larger
paging cache with a different organization (block size of one).
Also, other measures that minimize the need to clear the cache
have been provided. However, pathological conditions may still
arise. In general, these pathological conditions are
eliminated by using a sufficiently high degree of associativity
in the cache. However, the 2080 page cache, like the KL-10
cache, is only one-way associative. (A particularly interesting
observation is that the data cache on both machines is four-way
associative whereas the page cache is only one-way
associative.) In light of this, we have recommended that the
2080 EBOX microcode provide some additional caching of page
pointers or section pointers to smooth out these pathological
cases. The recent decision to extend the life of the KL-10
raises the question of whether we can do the same sort of
caching on the KL-10.

Before any additional measures are considered, it seems prudent
to commission a study of the problem to produce either
analytical or empirical support for new features. (An article in
a recent issue of Computer Magazine reported on the performance
characteristics of the VAX-11/780 paging cache, including
studies on various degrees of associativity. The conclusion
seems to be that increasing the cache size has the same benefit
as increasing the associativity, but the architectural
differences between the VAX and the 2080 could be significant.
In the absence of any other data, we can only assume that the
data applies to the 2080 as well as to the VAX. The article
did not address "pathological conditions".)

3.0 Indirect addressing 

The second problem, and one that we have yet to find an
adequate compromise for, is the performance of indirect
references.

The extended addressing architecture provides two techniques
for inter-section or global addressing: simple indexing and
indirection. Indexing allows one to address a contiguous
region of 256K words of virtual memory, plus or minus 128K from
the base indicated by the index register. In other words,
indexing computes a full 30-bit address by adding the value in
the index register to the sign-extended value found in the
address field of the instruction.
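
The global-indexing calculation described above can be sketched as
follows. The function names and the sample base address are
hypothetical; the sketch assumes the index register already holds a
30-bit global address and ignores local indexing and indirection.

```python
ADDRESS_MASK = (1 << 30) - 1   # 30-bit extended virtual address


def sign_extend_18(y):
    """Interpret an 18-bit address field as a signed offset."""
    y &= 0o777777
    return y - (1 << 18) if y & 0o400000 else y


def global_index_ea(index_register, y_field):
    """Effective address = index register + sign-extended Y, modulo 2**30."""
    return (index_register + sign_extend_18(y_field)) & ADDRESS_MASK


base = 0o12_000000                             # hypothetical base: section 12, word 0
print(oct(global_index_ea(base, 0o000005)))    # five words above the base
print(oct(global_index_ea(base, 0o777777)))    # one word below the base
```

The signed 18-bit field is what gives the reach of plus or minus
128K words around the base held in the index register.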

Another interesting property of global indexing is that the
index register must contain the base address of the data, and
the instruction word must be used as the offset. This is
contrary to the addressing and programming practiced up until
now on PDP-10s. This new way of addressing is brought about
because the instruction has only eighteen bits in which to
express an address, but a register has thirty bits. Therefore,
in order to address data that may be located anywhere in the
extended address space, one must swap the roles of the
instruction and the index register.

A problem created by this is that the familiar technique for
writing loops becomes invalid. That is, the code sequence

        MOVE    AC,BASE(ACX)

        AOBJN   ACX,LOOP

is no longer appropriate, as ACX can no longer contain

        -COUNT,,OFFSET

Instead, loops require the use of two or more registers if
indexing is the addressing choice. For example:



        HRRZ    ACX1,ACX
        ADD     ACX1,BASE
        MOVE    AC,0(ACX1)

        AOBJN   ACX,LOOP

Other instruction sequences are possible, but this example is 
representative of the nature of the difference. 

Indirection allows one to address data in a style similar to
the traditional methods. That is, indexing is always a
positive (or negative) offset from the unsigned base and a
single index may be used for offset addressing and loop
control.

Therefore one can write: 

        MOVE    AC,@[EFIW BASE(ACX)]



AOBJN ACX, LOOP 



3.1 Discussion

Despite the architectural bent in favor of indirection, and the
additional clarity afforded by the technique, the 2080 was
apparently not designed with this sort of addressing in mind.
Indirection will effectively defeat much of the advantage of
the IBOX prefetch and result in a significant increase in
execution time for any instruction employing it. This is so
because the IBOX is not equipped to decode indirect references,
and the EBOX is not sufficiently facile at effective-address
calculations to offset this deficiency.

Up until now, all of the extant software projects had assumed
the use of indirect addressing. TOPS-20 has already been
modified to reference its paging data base with indirection,
and more and more instances of indirection appear in the
monitor as time goes by.

The languages had planned to use indirection to reference large 
data bases. Again, this seemed the right choice because of the 
architectural direction and the ease of substituting 
indirection for the existing simple indexing methods. 

3.2 Impacts 

(All 2080 performance figures are based on "best guesses". The
variation between these values and the actual machine
performance may be significant, as much as 30%. See section
3.3 for more information.)

As best we can understand, the use of indirection will be five
times more expensive in execution time than the use of
indexing. Or, looked at in comparison with indirection on
the KL-10, simple instructions that use it will execute in
about the same time as the same instruction, including
indirection, on the KL-10 (see chart below). Also, the
instructions required to implement indexing run no faster, and
in some cases slower, on the KL-10 than the indirect addressing
style on the KL.

This represents the directly measurable differences. However, 
increased code sizes, within loops or not, will affect cache 
hits, in the data cache, the paging cache and the IBOX. These
effects are second order and hard to predict. 

It may be possible to avoid using indirection in the languages,
but the cost of larger loops, greater complexity in the
compilers (more use of volatile registers) and the variance in
characteristics between the KL-10 and the 2080 are significant
unknowns. The current plans are to use indirection until and
unless clear reasons demand a change in plans. The cost to
switch to indexing in FORTRAN has been estimated at two
man-months.

The monitor code has already been done and is working.
Replacing the uses of indirection with indexing, while not out
of the question, will take considerable time and effort. To
have to commit resources to a project that provides no direct
improvement in the product, at the expense of potentially
marketable improvements, would be unfortunate. And, as with
the languages, the differences between the KL-10 and the 2080
may well mean that the code is still less than optimal for one
of the processors.

The time to convert the "performance critical" uses of 
indirection to indexing is comparable to that for FORTRAN: two 
months. 

3.3 Instruction timing chart

This chart shows the time (indirection) to execute a single
move instruction using "global indirection" for addressing. An 
example of this is in section 3.0 above. It also shows the 
time (indexing) to execute a sequence of instructions that 
achieves the same result as indirect addressing, but with 
"global indexing" as the addressing mode. Again, section 3.0 
gives an example of such a code sequence. 



                     2080          KL-10

indirection       1100 nsecs     900 nsecs

indexing           200 nsecs    1300 nsecs



The values for the 2080 were derived by "counting cycles", 
estimating the cost of IBOX conflicts and averaging the various 
instruction sequences that could be used. Therefore this is 
not the "best case" performance for the 2080. Furthermore, the 
numbers represent only an "educated guess" and could vary as 
much as 30% from the "true values". 

The KL-10 values were derived from instruction timings on
KL2102 using "the best case" instruction sequence given in
section 3.0.

To put this in perspective, let's assume the 2080 is 5X the
performance of a KL-10. Furthermore, let's assume that an
instruction that uses indirect addressing will run at the same
speed as the same instruction on the KL-10. Furthermore, any
instruction that does not use indirection will run at 5X the
same instruction on a KL-10 (this is, admittedly, a
questionable assumption). If 20% of the instructions in a
program use indirection, then the effective speed of this
program will be 2.8X a KL-10, or a 45% loss in throughput. If
10% of the instructions use indirection, then there will be a
27% loss. Presently, the uses of indirection are rare, but as
applications take advantage of extended addressing, use of
indirect addressing will grow. A first-order guess of 20% of
the instructions in a typical FORTRAN application using array
data seems appropriate.
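
The assumptions above can be turned into a short calculation. The
sketch below weights execution time by instruction fraction, with
indirect instructions running at KL-10 speed and all others at the
ideal factor of 5; the function names are invented for illustration.
It gives the 2.8X figure for 20% indirection, and losses within a
few points of those quoted, so the memo's exact rounding or method
may differ slightly from this straightforward weighting.

```python
def power_factor(indirect_fraction, ideal=5.0, indirect_speed=1.0):
    """Effective speed relative to a KL-10 under the stated assumptions."""
    time = (indirect_fraction / indirect_speed
            + (1.0 - indirect_fraction) / ideal)
    return 1.0 / time


def loss_to_ideal(indirect_fraction, ideal=5.0):
    """Fractional loss in throughput relative to the ideal power factor."""
    return 1.0 - power_factor(indirect_fraction, ideal) / ideal


for pct in (20, 10):
    f = pct / 100
    print(pct, round(power_factor(f), 1), round(100 * loss_to_ideal(f)))
```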

Cost of indirection
(ideal power factor is 5)

% indirect instructions    power factor    % loss to ideal

          30                                     54
          20                                     45
          15                                     37
          10                                     27
           5                                     17



3.4 Other "indirection" problems 

Certainly, the above-stated problem with writing loops is the 
most pronounced. However, all cases of addressing are affected 
by the performance of indirection. 

For example, consider a subroutine call. Normally this would
look like (all timings are subject to the same caveat as given
in section 3.3):

        CALL    SUB                     (528 nsecs)



However, if the routine being called may be in another section
from the call site, we would be inclined to write:

        CALL    @[EFIW SUB]             (1628 nsecs)



However, the cost of the indirect address calculation might
well dictate code of the form:

        XMOVEI  SAC,SUB
        CALL    0(SAC)                  (968 nsecs)

or

        MOVE    AC,[SUB]
        CALL    0(SAC)                  (682 nsecs)



The latter form incurs a penalty of approximately 154 nsecs
over the non-indexed and non-indirect form, the middle form a
penalty of 440 nsecs, and the form using indirection a penalty
of 1100 nsecs. Therefore, the penalty for indirection may be as
high as an order of magnitude greater than that for indexing.
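
The penalties quoted above follow directly from the four timings.
The short sketch below recomputes them; the dictionary keys are
only descriptive labels for the code forms shown earlier.

```python
BASELINE_NS = 528   # CALL SUB, neither indexed nor indirect

call_forms_ns = {
    "indirect (CALL @[EFIW SUB])": 1628,
    "XMOVEI + indexed CALL": 968,
    "MOVE + indexed CALL": 682,
}

# Penalty of each form over the plain call, in nanoseconds.
penalties = {form: ns - BASELINE_NS for form, ns in call_forms_ns.items()}

# How much worse the indirect form is than the cheapest indexed form.
ratio = (penalties["indirect (CALL @[EFIW SUB])"]
         / penalties["MOVE + indexed CALL"])

print(penalties, round(ratio, 1))
```

On these figures the indirect form costs roughly seven times the
penalty of the cheaper indexed form, which is the basis for the
"as high as an order of magnitude" remark.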

The KL-10 will execute the indirect form of the call slightly
faster than it will the two-instruction, indexed form.



3.4.1 LINK 

One of the intended features of the new LINK is to detect and
remove unnecessary indirect references. That is, when one
writes a program, or when a program is compiled, the author or
compiler does not have the knowledge to predict which data
references will be to data in the PC section and which will be
to data in other sections. Therefore, the compilers will
generate code that uses global (viz. indirect) references.

During the loading of the program, LINK will be able to 
determine whether a global reference is required and will be 
able to convert the indirect references into local references. 

If the compilers are obligated to produce indexed references to 
data, this operation will become much more difficult, and in 
many cases impossible, to achieve. 

4.0 Recommendation

4.1 Indirection 

Ideally, the 2080 IBOX should be modified to detect and
interpret instruction addressing of the form:

        OP      AC,@[EFIW BASE(X)]

That is, addressing using a single level of global indirection, 
possibly with indexing, should be handled completely and 
efficiently by the IBOX. 

Many of the benefits of this change have been described
already:

. "ideal" conformance to the architecture
. encourage the use of "apparent code"
. smaller programs
. allow current software development plans to proceed
. provide common, efficient code for KL-10 and 2080

4.2 Paging cache

As stated earlier, more study is needed to know if we have a 
problem at all. However, it seems quite clear that some sort 
of microcode-supported cache in the EBOX is desirable, both for 
the 2080 and the KL-10, and should be included in the FRS 2080 
microcode and be available for the "FCC" KL-10. 



[Recording initiated at Wed 28-Jul-82 14:44:06]

LINK FROM HALL, TTY 205

TOPS-20 Command processor 6(734)
@RUN KCMON

DECSYSTEM KC10 DIAGNOSTIC MONITOR

VERSION 11.0, SV=2.0, TOPS-20, KL10, CPU#=2123

WEDNESDAY, JULY 28, 82 14:44:16

*********

KCMON>RUN DCKFB

DECSYSTEM KC10 INSTRUCTION TIMING TEST (DCKFB)
VERSION 0.1, SV=2.0, TOPS-20, KL10, CPU#=2123
WEDNESDAY, JULY 28, 82 14:44:22
SWITCHES = 000000 000000

THE FOLLOWING TIMES ARE APPROXIMATIONS IN USER MODE
DUE TO VARIOUS SYSTEM CONFLICTS.

 37 - LUUO    =  1081 NSEC.
110 - DFAD    =  2146 NSEC, (1 RIGHT SHIFT)
110 - DFAD    =  2144 NSEC, (8 RIGHT SHIFT - 1 LEFT)
111 - DFSB    =  2383 NSEC.
112 - DFMP    =  4394 NSEC.
113 - DFDV    =  9026 NSEC.
114 - DADD    =  1133 NSEC.
115 - DSUB    =  1121 NSEC.
116 - DMUL    =  4727 NSEC.
117 - DDIV    = 10082 NSEC.
120 - DMOVE   =   768 NSEC.
121 - DMOVN   =  1010 NSEC.
122 - FIX     =   911 NSEC, (A FLOATING POINT ONE)
124 - DMOVEM  =  1018 NSEC.
125 - DMOVNM  =  1257 NSEC.
126 - FIXR    =   920 NSEC, (A FLOATING POINT ONE)
127 - FLTR    =  1610 NSEC, (AN INTERGER ONE)
132 - FSC     =  1581 NSEC, (AN INTERGER ONE)
133 - IBP     =   776 NSEC, (INCREMENT POSITION)
133 - IBP     =   840 NSEC, (INCREMENT Y)
133 - ADJBP   =  9671 NSEC, (POSITIVE)
133 - ADJBP   =  9613 NSEC, (NEGATIVE)
134 - ILDB    =  1393 NSEC, (7 BITS)
134 - ILDB    =  1394 NSEC, (6 BITS)
135 - LDB     =  1263 NSEC, (7 BITS - POS 6)
135 - LDB     =  1258 NSEC, (7 BITS - POS 13)
135 - LDB     =  1258 NSEC, (6 BITS - POS 5)
135 - LDB     =  1271 NSEC, (6 BITS - POS 11)
136 - IDPB    =  1806 NSEC, (7 BITS)
136 - IDPB    =  1792 NSEC, (6 BITS)
137 - DPB     =  1640 NSEC, (7 BITS - POS 6)
137 - DPB     =  1577 NSEC, (7 BITS - POS 13)
137 - DPB     =  1642 NSEC, (6 BITS - POS 5)
137 - DPB     =  1569 NSEC, (6 BITS - POS 11)
140 - FAD     =  1646 NSEC, (1 RIGHT SHIFT)
140 - FAD     =  1680 NSEC, (8 RIGHT SHIFT - 3 LEFT)
142 - FADM    =  1907 NSEC.
143 - FADB    =  1903 NSEC.
144 - FADR    =  1678 NSEC.
145 - FADRI   =  1610 NSEC.



146 - FADRM   =  1940 NSEC.
147 - FADRB   =  1940 NSEC.
150 - FSB     =  2074 NSEC.
152 - FSBM    =  1944 NSEC.
153 - FSBB    =  1941 NSEC.
154 - FSBR    =  1991 NSEC.
155 - FSBRI   =  1920 NSEC.
156 - FSBRM   =  1880 NSEC.
157 - FSBRB   =  1872 NSEC.
160 - FMP     =  2440 NSEC, (7 ADD/SUB 14 SHIFTS)
162 - FMPM    =  2650 NSEC.
163 - FMPB    =  2647 NSEC.
164 - FMPR    =  2484 NSEC.
165 - FMPRI   =  2426 NSEC.
166 - FMPRM   =  2677 NSEC.
167 - FMPRB   =  2681 NSEC.
170 - FDV     =  4993 NSEC.
172 - FDVM    =  5197 NSEC.
173 - FDVB    =  5210 NSEC.
174 - FDVR    =  5019 NSEC.
175 - FDVRI   =  4970 NSEC.
176 - FDVRM   =  5240 NSEC.
177 - FDVRB   =  5277 NSEC.
200 - MOVE    =   420 NSEC, (MOVE FROM MEMORY)
200 - MOVE    =   336 NSEC, (MOVE FROM AC)
200 - MOVE    =   734 NSEC, (MOVE INDIRECT)
200 - MOVE    =   419 NSEC, (MOVE INDEXED)
201 - MOVEI   =   281 NSEC.
202 - MOVEM   =   595 NSEC.
203 - MOVES   =   735 NSEC, (MOVES AC)
203 - MOVES   =   736 NSEC, (MOVES MEMORY)
204 - MOVS    =   420 NSEC.
205 - MOVSI   =   280 NSEC.
206 - MOVSM   =   737 NSEC.
207 - MOVSS   =   736 NSEC.
210 - MOVN    =   596 NSEC.
211 - MOVNI   =   453 NSEC.
212 - MOVNM   =   9 1 NSEC.
213 - MOVNS   =   912 NSEC.
214 - MOVM    =   492 NSEC, (NEGATIVE)
214 - MOVM    =   492 NSEC, (POSITIVE)
215 - MOVMI   =   350 NSEC.
216 - MOVMM   =   808 NSEC.
217 - MOVMS   =   804 NSEC.
220 - IMUL    = S 1~ 6 9 1 NSEC, (9 ADD/SUB 18 SHIFTS)
221 - IMULI   =  2143 NSEC.
222 - IMULM   =  2458 NSEC.
223 - IMULB   =  2465 NSEC.
224 - MUL     =  2197 NSEC.
225 - MULI    =  2108 NSEC.
226 - MULM    =  2324 NSEC.
227 - MULB    =  2501 NSEC.
230 - IDIV    =  5219 NSEC.
231 - IDIVI   =  5111 NSEC.
232 - IDIVM   =  5223 NSEC.
233 - IDIVB   =  5418 NSEC.
234 - DIV     =  4870 NSEC.
235 - DIVI    =  4747 NSEC.
236 - DIVM    =  2787 NSEC.
237 - DIVB    =  2797 NSEC.



240 - ASH     =  1223 NSEC, (POSITIVE 15)
240 - ASH     =  3268 NSEC, (POSITIVE 15 OVERFLOW)
240 - ASH     =   735 NSEC, (NEGATIVE 15)
241 - ROT     =   594 NSEC, (LEFT 7)
241 - ROT     =   698 NSEC, (RIGHT 7)
242 - LSH     =   560 NSEC, (LEFT 35)
242 - LSH     =   665 NSEC, (RIGHT 35)
243 - JFFO    =   905 NSEC, (1 B0)
243 - JFFO    =  2128 NSEC, (1 B35)
244 - ASHC    =  1498 NSEC, (LEFT 15)
245 - ROTC    =   802 NSEC, (LEFT 7)
246 - LSHC    =   984 NSEC, (LEFT 71)
246 - LSHC    =  1013 NSEC, (RIGHT 71)
250 - EXCH    =       NSEC, (AC WITH AN AC)
250 - EXCH    =   736 NSEC, (AC WITH MEMORY)
251 - BLT     =  1634 NSEC, (MEMORY TO MEMORY 1 WORD)
251 - BLT     =  2247 NSEC, (MEMORY TO MEMORY 2 WORDS)
251 - BLT     =  1640 NSEC, (AC TO MEMORY 1 WORD)
251 - BLT     =  2199 NSEC, (AC TO MEMORY 2 WORDS)
252 - AOBJP   =   458 NSEC.
253 - AOBJN   =   460 NSEC.
254 - JRST    =   315 NSEC.
254 - JRSTF   =   876 NSEC.
255 - JFCL    =   765 NSEC.
256 - XCT     =   495 NSEC.
260 - PUSHJ   =  1012 NSEC.
261 - PUSH    =   1 1 NSEC.
262 - POP     =   761 NSEC.
264 - JSR     =   679 NSEC.
265 - JSP     =   386 NSEC.
266 - JSA     =   640 NSEC.
267 - JRA     =   770 NSEC.
270 - ADD     =   422 NSEC.
271 - ADDI    =   320 NSEC.
272 - ADDM    =   777 NSEC.
273 - ADDB    =   771 NSEC.
274 - SUB     =   525 NSEC.
275 - SUBI    =   425 NSEC.
276 - SUBM    =   881 NSEC.
277 - SUBB    =   739 NSEC.
300 - CAI     =   419 NSEC.
301 - CAIL    =   420 NSEC.
302 - CAIE    =   422 NSEC.
303 - CAILE   =   424 NSEC.
304 - CAIA    =   421 NSEC.
305 - CAIGE   =   420 NSEC.
306 - CAIN    =   423 NSEC.
307 - CAIG    =   420 NSEC.
310 - CAM     =   559 NSEC.
311 - CAML    =   567 NSEC.
312 - CAME    =   572 NSEC.
313 - CAMLE   =   572 NSEC.
314 - CAMA    =   561 NSEC.
315 - CAMGE   =   573 NSEC.
316 - CAMN    =   569 NSEC.
317 - CAMG    =   565 NSEC.
320 - JUMP    =   420 NSEC.
321 - JUMPL   =   419 NSEC.



„, ,J ,. } 

Ovi, A- 


-- JUMPE 


wj *.. W 


- JUMPLE 


324 


- JUMP A 


325 


- JU.MPGE 


326 


- JUMPN 


W Ah / 


- JUMPG 


330 


- SKIP 


£-.5 A 


- SKIPL 


332 


-- SKIPE 


333 


••- SKIPLE 


334 


- SKI PA 


335 


•- SKIPGE 


336 


- SKIPN 


•J ~Z "7 


- SKIPG 


340 


- AOJ 


341 


- AOJL 


342 


-- AOJE 


343 


- AOJLE 


344 


- AOJ A 


34 5 ' 


~ AOJOE 


346 


- AQJH 


347 


- AOJG 


350 


- A OS 


351 


~ AQSL 


352 


- AOSE 


353 


- AQSLE 


354 


- AGS A 


3d *1 


- AQSGE 


356 


- AOSN 


357 


- AOSG 


360 


- SO J 


361. 


- 80 J L 


362 


- SOJE 


363 


- SOJLE 


364 


- SO J A 


365 


- SOJOE 


366 


- SOJN 


3 6 7 


•••• SOJG 


370 


- SOS 


37.1 


- 80SL 


372 


- SOSE 


373 


- SOSLE 


374 


•- SQSA 


375 


- SOSGE 


376 


.- S.08N 


377 


- sose 


400 


- SET 2 


401 


■- 3ETZI 


402 


.- -SETZM 


403 


™ SETZB 


404 


- AND 


405 


- AND I 


406 


- ANBM 


407 


- AN OB 


410 


~ AN DC A 


43,1 


- AMDCAI 


412 


- ANDCAH 


413 


- ANDCAB 


414 


-...JSETM 



420 NSEC* 

420 NSEC* 

420 NSEC* 

421 NSEC* 

422 NSEC, 
419 NSEC. 



567 NSEC, 
583 NSEC, 
572 NSEC, 
563 NSEC* 
574 NSEC-'* 
567 .NSEC; 
566 NSEC* 



NSEC* 
NSEC, 



455 


NSEC* 


453 


NSEC* 


455 


weep- 


456 


NSEC* 


455 


NSEC* 


454 


NSEC* 


734 


NSEC * 


748 


NSEC* 


745 


NSEC* 


746 


NSEC * 


745 


NSEC* 


7 4 8 


NSEC* 


753 


NSEC* 


7 4 6 


NSEC* 


5 6 4 


NSEC* 


559 


NSEC* 


Q fy A .* 


NSEC* 


559 


NSEC* 


841 


NSEC* 


559 


NSEC* 


560 


NSEC* 


558 


NSEC* 


752 


NSEC* 


745 


NSEC* 


743 


NSEC* 


746 


NSEC* 


647 


NSEC* 


743 


NSEC* 


747 


NSEC* 


740 


NSEC* 


281 


NSEC* 


280 


NSEC* 


595 


NSEC* 


595 


NSEC* 


423 


NSEC* 


315 


NSEC* 


770 


NSEC* 


771 


NSEC* 


421 


NSEC* 


315 


NSEC* 


770 


NSEC* 


770 


NSEC* 


385 


NSjEC * 



415 - SETMI   =   351 NSEC.
416 - SETMM   =   /3b NSEC.
417 - SETMB   =   736 NSEC.
420 - ANDCM   =   423 NSEC.
421 - ANDCMI  =   315 NSEC.
422 - ANDCMM  =   767 NSEC.
423 - ANDCMB  =   771 NSEC.
424 - SETA    =   385 NSEC.
425 - SETAI   =   281 NSEC.
426 - SETAM   =   733 NSEC.
427 - SETAB   =   737 NSEC.
430 - XOR     =   420 NSEC.
431 - XORI    =   316 NSEC.
432 - XORM    =   768 NSEC.
433 - XORB    =   773 NSEC.
434 - OR      =   417 NSEC.
435 - ORI     =   315 NSEC.
436 - ORM     =   769 NSEC.
437 - ORB     =   769 NSEC.
440 - ANDCB   =   420 NSEC.
441 - ANDCBI  =   315 NSEC.
442 - ANDCBM  =   772 NSEC.
443 - ANDCBB  =   768 NSEC.
444 - EQV     =   420 NSEC.
445 - EQVI    =   315 NSEC.
446 - EQVM    =   771 NSEC.
447 - EQVB    =   768 NSEC.
450 - SETCA   =   319 NSEC.
451 - SETCAI  =   315 NSEC.
452 - SETCAM  =   633 NSEC.
453 - SETCAB  =   630 NSEC.
454 - ORCA    =   420 NSEC.
455 - ORCAI   =   315 NSEC.
456 - ORCAM   =   771 NSEC.
457 - ORCAB   =   768 NSEC.
460 - SETCM   =   385 NSEC.
461 - SETCMI  =   280 NSEC.
462 - SETCMM  =   736 NSEC.
463 - SETCMB  =   735 NSEC.
464 - ORCM    =   419 NSEC.
465 - ORCMI   =   315 NSEC.
466 - ORCMM   =   770 NSEC.
467 - ORCMB   =   769 NSEC.
470 - ORCB    =   419 NSEC.
471 - ORCBI   =   316 NSEC.
472 - ORCBM   =   771 NSEC.
473 - ORCBB   =   766 NSEC.
474 - SETO    =   281 NSEC.
475 - SETOI   =   280 NSEC.
476 - SETOM   =   596 NSEC.
477 - SETOB   =   594 NSEC.
500 - HLL     =   422 NSEC.
501 - HLLI    =   386 NSEC.
502 - HLLM    =   769 NSEC.
503 - HLLS    =   733 NSEC.
504 - HRL     =   420 NSEC.
505 - HRLI    =   316 NSEC.
506 - HRLM    =   840 NSEC.
507 - HRLS    =   734 NSEC.



510 - HLLZ      384 NSEC.
511 - HLLZI     283 NSEC.
512 - HLLZM     734 NSEC.
513 - HLLZS     736 NSEC.
514 - HRLZ      385 NSEC.
515 - HRLZI     282 NSEC.
516 - HRLZM     735 NSEC.
517 - HRLZS     734 NSEC.
520 - HLLO      385 NSEC.
521 - HLLOI     281 NSEC.
522 - HLLOM     734 NSEC.
523 - HLLOS     735 NSEC.
524 - HRLO      385 NSEC.
525 - HRLOI     281 NSEC.
526 - HRLOM     732 NSEC.
527 - HRLOS     736 NSEC.
530 - HLLE      458 NSEC.
531 - HLLEI     352 NSEC.
532 - HLLEM     803 NSEC.
533 - HLLES     805 NSEC.
534 - HRLE      455 NSEC.
535 - HRLEI     350 NSEC.
536 - HRLEM     802 NSEC.
537 - HRLES     804 NSEC.
540 - HRR       420 NSEC.
541 - HRRI      315 NSEC.
542 - HRRM      767 NSEC.
543 - HRRS      734 NSEC.
544 - HLR       422 NSEC.
545 - HLRI      316 NSEC.
546 - HLRM      839 NSEC.
547 - HLRS      735 NSEC.
550 - HRRZ      386 NSEC.
551 - HRRZI     281 NSEC.
552 - HRRZM     734 NSEC.
553 - HRRZS     733 NSEC.
554 - HLRZ      386 NSEC.
555 - HLRZI     281 NSEC.
556 - HLRZM     734 NSEC.
557 - HLRZS     735 NSEC.
560 - HRRO      389 NSEC.
561 - HRROI     280 NSEC.
562 - HRROM     735 NSEC.
563 - HRROS     736 NSEC.
564 - HLRO      385 NSEC.
565 - HLROI     280 NSEC.
566 - HLROM     739 NSEC.
567 - HLROS     735 NSEC.
570 - HRRE      454 NSEC.
571 - HRREI     350 NSEC.
572 - HRREM     802 NSEC.
573 - HRRES     808 NSEC.
574 - HLRE      454 NSEC.
575 - HLREI     350 NSEC.
576 - HLREM     802 NSEC.
577 - HLRES     806 NSEC.
600 - TRN       280 NSEC.
601 - TLN       281 NSEC.



602 - TRNE      422 NSEC.
603 - TLNE      493 NSEC.
604 - TRNA      386 NSEC.
605 - TLNA      387 NSEC.
606 - TRNN      420 NSEC.
607 - TLNN      491 NSEC.
610 - TDN       283 NSEC.
611 - TSN       281 NSEC.
612 - TDNE      563 NSEC.
613 - TSNE      761 NSEC.
614 - TDNA      538 NSEC.
615 - TSNA      529 NSEC.
616 - TDNN      558 NSEC.
617 - TSNN      626 NSEC.
620 - TRZ       385 NSEC.
621 - TLZ       386 NSEC.
622 - TRZE      423 NSEC.
623 - TLZE      492 NSEC.
624 - TRZA      386 NSEC.
625 - TLZA      387 NSEC.
626 - TRZN      421 NSEC.
627 - TLZN      488 NSEC.
630 - TDZ       523 NSEC.
631 - TSZ       524 NSEC.
632 - TDZE      566 NSEC.
633 - TSZE      636 NSEC.
634 - TDZA      527 NSEC.
635 - TSZA      529 NSEC.
636 - TDZN      560 NSEC.
637 - TSZN      632 NSEC.
640 - TRC       390 NSEC.
641 - TLC       385 NSEC.
642 - TRCE      426 NSEC.
643 - TLCE      492 NSEC.
644 - TRCA      387 NSEC.
645 - TLCA      387 NSEC.
646 - TRCN      421 NSEC.
647 - TLCN      490 NSEC.
650 - TDC       524 NSEC.
651 - TSC       524 NSEC.
652 - TDCE      565 NSEC.
653 - TSCE      634 NSEC.
654 - TDCA      530 NSEC.
655 - TSCA      530 NSEC.
656 - TDCN      561 NSEC.
657 - TSCN      631 NSEC.
660 - TRO       384 NSEC.
661 - TLO       385 NSEC.
662 - TROE      422 NSEC.
663 - TLOE      493 NSEC.
664 - TROA      388 NSEC.
665 - TLOA      390 NSEC.
666 - TRON      418 NSEC.
667 - TLON      489 NSEC.
670 - TDO       528 NSEC.
671 - TSO       525 NSEC.
672 - TDOE      565 NSEC.
673 - TSOE      631 NSEC.
674 - TDOA      528 NSEC.



675 - TSOA      532 NSEC.
676 - TDON      560 NSEC.
677 - TSON      627 NSEC.
123 - EXTEND    4767 NSEC.   (OVERHEAD - MOVS! BLT (6 WDS))
001 - CMPSL     5408 NSEC.   (1 BYTE)
001 - CMPSL     7510 NSEC.   (2 BYTES)
001 - CMPSL     14602 NSEC.  (5 BYTES)
002 - CMPSE     5338 NSEC.   (1 BYTE)
002 - CMPSE     7700 NSEC.   (2 BYTES)
002 - CMPSE     14830 NSEC.  (5 BYTES)
003 - CMPSLE    5262 NSEC.   (1 BYTE)
003 - CMPSLE    7547 NSEC.   (2 BYTES)
003 - CMPSLE    14726 NSEC.  (5 BYTES)
004 - EDIT      101810 NSEC. (BLANK)
004 - EDIT      6815? NSEC.  ($.01 DUE US)
004 - EDIT      66210 NSEC.  ($99999.99 DUE US)
004 - EDIT      66165 NSEC.  ($99999.99 CREDIT)
005 - CMPSGE    5376 NSEC.   (1 BYTE)
005 - CMPSGE    7526 NSEC.   (2 BYTES)
005 - CMPSGE    14785 NSEC.  (5 BYTES)
006 - CMPSN     5278 NSEC.   (1 BYTE)
006 - CMPSN     7582 NSEC.   (2 BYTES)
006 - CMPSN     4877 NSEC.   (5 BYTES)
007 - CMPSG     5249 NSEC.   (1 BYTE)
007 - CMPSG     7510 NSEC.   (2 BYTES)
007 - CMPSG     14706 NSEC.  (5 BYTES)
010 - CVTDBO    5170 NSEC.   (1 BYTE)
010 - CVTDBO    7009 NSEC.   (2 BYTES)
010 - CVTDBO    [illegible] NSEC. (5 BYTES)
011 - CVTDBT    5590 NSEC.   (1 BYTE)
011 - CVTDBT    7764 NSEC.   (2 BYTES)
011 - CVTDBT    14248 NSEC.  (5 BYTES)
012 - CVTBDO    14725 NSEC.  (1 BYTE)
012 - CVTBDO    16781 NSEC.  (2 BYTES)
012 - CVTBDO    24081 NSEC.  (5 BYTES)
013 - CVTBDT    14813 NSEC.  (1 BYTE)
013 - CVTBDT    17452 NSEC.  (2 BYTE)
013 - CVTBDT    25845 NSEC.  (5 BYTE)
014 - MOVSO     6812 NSEC.   (1 BYTE)
014 - MOVSO     9300 NSEC.   (2 BYTES)
014 - MOVSO     17300 NSEC.  (5 BYTES)
015 - MOVST     7123 NSEC.   (1 BYTE)
015 - MOVST     10425 NSEC.  (2 BYTES)
015 - MOVST     19322 NSEC.  (5 BYTES)
016 - MOVSLJ    5787 NSEC.   (1 BYTE)
016 - MOVSLJ    7917 NSEC.   (2 BYTES)
016 - MOVSLJ    14864 NSEC.  (5 BYTES)
017 - MOVSRJ    6521 NSEC.   (1 BYTE)
017 - MOVSRJ    8560 NSEC.   (2 BYTES)
017 - MOVSRJ    15360 NSEC.  (5 BYTES)
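
The 1-, 2-, and 5-byte timings can be split, roughly, into a fixed overhead and a per-byte cost. The sketch below is illustrative only. It assumes that the first three byte-count entries (5408, 7510, and 14602 NSEC) belong to CMPSL and that execution time grows linearly with string length; neither assumption is stated in the printout. The function name per_byte_cost is hypothetical and is not part of the diagnostic.

```python
def per_byte_cost(samples):
    """Fit a least-squares line through (byte_count, nanoseconds) pairs.

    Returns (fixed_ns, ns_per_byte): the intercept and slope of the line.
    """
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    # Sum of squared deviations in x, and of cross deviations in x and y.
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Assumed CMPSL timings: 1, 2, and 5 bytes (values taken from the printout).
cmpsl = [(1, 5408), (2, 7510), (5, 14602)]
fixed_ns, ns_per_byte = per_byte_cost(cmpsl)
print(f"fixed overhead ~{fixed_ns:.0f} NSEC, ~{ns_per_byte:.0f} NSEC per byte")
```

For these three points the fit comes out to roughly 2.3 USEC per byte on top of about 3.0 USEC of fixed cost. The same function can be applied to any other three-point group in the table.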



SID 
!> » 

PDP-10 KL10 SPECIAL INSTRUCTION TIMING TEST (DLKFB)

VERSION 0.1, SV=0.1, CPU#=2123, MCV=275, MCO=40, HO=36, 60HZ

SWITCHES = 000000 000000

CLK SOURCE = EXTERN, CLK RATE = FULL, AC BLK ? CACHES O 7

CLOCK CYCLE     33 NSEC.
INDEXING        34 NSEC.
INDIRECT        23? NSEC.

[illegible] - LUUO   1061 NSEC.
40 - MUUO       4381 NSEC.



110 - DFAD
110 - DFAD
111 - DFSB
112 - DFMP
113 - DFDV
114 - DADD
115 - DSUB
116 - DMUL
117 - DDIV
120 - DMOVE
121 - DMOVN
122 - FIX
124 - DMOVEM
125 - DMOVNM
126 - FIXR
127 - FLTR
132 - FSC
133 - IBP
133 - IBP
133 - ADJBP
133 - ADJBP
134 - ILDB
134 - ILDB
135 - LDB
135 - LDB
135 - LDB
135 - LDB
136 - IDPB
136 - IDPB
137 - DPB
137 - DPB
137 - DPB
137 - DPB
140 - FAD
140 - FAD
142 - FADM
143 - FADB
144 - FADR
145 - FADRI
146 - FADRM
147 - FADRB
150 - FSB
152 - FSBM
153 - FSBB
154 - FSBR


H G'ET 


_ C Cf K E> T 



2100 NSEC.  (1 RIGHT SHIFT)
2100 NSEC.  (8 RIGHT SHIFT - 1 LEFT)
2351 NSEC.
4456 NSEC.
9625 NSEC.
1091 NSEC.
1088 NSEC.
4783 NSEC.
10427 NSEC.
745 NSEC.
986 NSEC.
878 NSEC.
986 NSEC.
1230 NSEC.
878 NSEC.
1568 NSEC.



1533 
681 
749 

9308 

9308 
1121 

986 

986 

986 

986 
1532 
1532 
1369 
1300 
1369 
1300 

1601 
1601 
1842 
1842 
1636 
1568 
1878 
1878 

2017 
1874 
1878 
1947 

I 070 WCC«'. 



NSEC* 
NSEC * 
NSEC* 
NSEC* 
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NSEC* 


(2 BYTES)
14370 NSEC.  (5 BYTES)
18829: 1. NSEC.  (BLANK)
94723 NSEC.  ($.01 DUE US)
90861 NSEC.  ($99999.99 DUE US)
90861 NSEC.  ($99999.99 CREDIT)
4140 NSEC.   (1 BYTE)
6551 NSEC.   (2 BYTES)
14370 NSEC.  (5 BYTES)
4357 NSEC.   (1 BYTE)
6811 NSEC.   (2 BYTES)
15169 NSEC.  (5 BYTES)
4268 NSEC.   (1 BYTE)
6739 NSEC.   (2 BYTES)
14957 NSEC.  (5 BYTES)
4439 NSEC.   (1 BYTE)
6151 NSEC.   (2 BYTES)
11578 NSEC.  (5 BYTES)
4947 NSEC.   (1 BYTE)
7211 NSEC.   (2 BYTES)
14370 NSEC.  (5 BYTES)
14672 NSEC.  (1 BYTE)
17280 NSEC.  (2 BYTES)
      NSEC.  (5 BYTES)
14983 NSEC.  (1 BYTE)
18098 NSEC.  (2 BYTE)
28187 NSEC.  (5 BYTE)
5600 NSEC.   (1 BYTE)
8289 NSEC.   (2 BYTES)
17280 NSEC.  (5 BYTES)
6008 NSEC.   (1 BYTE)
9254 NSEC.   (2 BYTES)
19686 NSEC.  (5 BYTES)
4579 NSEC.   (1 BYTE)
6817 NSEC.   (2 BYTES)
14370 NSEC.  (5 BYTES)
5182 NSEC.   (1 BYTE)
7448 NSEC.   (2 BYTES)
14983 NSEC.  (5 BYTES)
1334 NSEC.
6967 NSEC.
1782 NSEC.








63u \j 


NSEC. 








630aj 


NSEC. 








^LOJ^ 


ki &js^ 









~ ITiaTSTEnf' 



700 - CONO PAG   = 9506 NSEC.
700 - DATAO PAG  = [illegible] NSEC. (LOAD UBR)
700 - DATAO PAG  = 1507 NSEC. (LOAD AC BLK)
700 - CONI 774   = 7170 NSEC.
700 - CONO 774   = 1787 NSEC.
700 - DATAI 774  = 6594 NSEC.
700 - DATAO 774  = 6388 NSEC.





TEST COMPLETED 

D20MON CMD -
D20MON CMD - XXX

D20MON CMD - XXX

D20MON CMD - dfkfb

PROGRAM NOT FOUND - DFKFB.

D20MON CMD - r
DISK? DIRECTORY - psJ

D20MON CMD - dfkfb

DFKFB, A10 VER 0.1 02-MAY-75

PDP-10 KL10 INSTRUCTION TIMING TEST (DFKFB)

VERSION 0.1, SV=0.1, CPU#=2123, MCV=275, MCO=40, HO=36, 60HZ

SWITCHES = 000000 000000

CLK SOURCE = EXTERN, CLK RATE = FULL, AC BLK , CACHE? 12 3

1 - BASIC CLOCK CYCLE IS 33 NSEC.
2 - INDEXING TAKES 34 NSEC.
3 - INDIRECT TAKES 234 NSEC.
4 - INDEXING AND INDIRECT TAKES 267 NSEC.
5 - MOVEI TAKES 266 NSEC.
6 - MOVE FROM AC TAKES 366 NSEC.
7 - MOVE FROM MEMORY TAKES 400 NSEC.
8 - HRR FROM MEMORY TAKES 433 NSEC.
9 - SETOM TAKES 466 NSEC.
10 - JRST TAKES 300 NSEC.
11 - JSR TAKES 567 NSEC.
12 - PUSHJ TAKES 701 NSEC.
13 - ADD FROM MEMORY TAKES 433 NSEC.
14 - MUL (9 ADD/SUB - 18 SHIFTS) TAKES 2.10 USEC.
15 - DIV TAKES 4.65 USEC.
16 - FIX A FLOATING POINT ONE TAKES 86? NSEC.
17 - FLTR AN INTERGER ONE TAKES 1.53 USEC.
18 - FAD (1 RIGHT SHIFT) TAKES 1.57 USEC.
19 - FAD (8 SHIFT RIGHT - 3 LEFT) TAKES 1.80 USEC.
20 - FMP (7 ADD/SUB - 14 SHIFTS) TAKES 2.33 USEC.
21 - FDV TAKES 4.77 USEC.
22 - DMOVE FROM MEMORY TAKES 733 NSEC.
23 - DFAD (1 RIGHT SHIFT) TAKES 2.03 USEC.
24 - DFAD (8 SHIFT RIGHT - 1 LEFT) TAKES 2.03 USEC.
25 - DFMP (7 ADD/SUB - 32 SHIFTS) TAKES 4.20 USEC.
26 - DFDV TAKES 8.60 USEC.
27 - CONO PI TAKES 1.60 USEC.
28 - CONI PI TAKES 2.80 USEC.
29 - DATAO APR TAKES 1.30 USEC.
30 - DATAI APR TAKES 1.47 USEC.
31 - MOVE TO MEMORY TAKES 566 NSEC.
32 - LOGICAL SHIFT (35 PLACES LEFT) TAKES 533 NSEC.
33 - LOGICAL SHIFT (35 PLACES RIGHT) TAKES 633 NSEC.
34 - LOGICAL SHIFT COMBINED (71 PLACES LEFT) TAKES 933 NSEC.
35 - LOGICAL SHIFT COMBINED (71 PLACES RIGHT) TAKES 967 NSEC.
36 - INCREMENT BYTE POINTER TAKES 834 NSEC.
37 - INCREMENT AND LOAD BYTE TAKES 1.20 USEC.
38 - INCREMENT AND DEPOSIT BYTE TAKES 1.50 USEC.
39 - JFCL TAKES 733 NSEC.
40 - CAI TAKES 400 NSEC.
41 - JUMP TAKES 400 NSEC.
42 - CAM TAKES 500 NSEC.
43 - EQV AC TO AC TAKES 400 NSEC.
44 - EQV MEMORY TO AC TAKES 433 NSEC.
45 - SETOB TAKES 566 NSEC.
46 - AOS TO MEMORY TAKES 700 NSEC.
47 - EXCHANGE AN AC WITH AN AC TAKES 533 NSEC.
48 - EXCHANGE AN AC WITH MEMORY TAKES 700 NSEC.
49 - EXECUTE TAKES 533 NSEC.
50 - BLT MEMORY TO MEMORY TAKES 1.60 USEC.
51 - BLT AC TO MEMORY TAKES 1.57 USEC.
52 - DATAI TAKES 10.00 USEC.
53 - DATAO TAKES 9.00 USEC.

TEST COMPLETED
D20MON CMD -
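
Most of the figures in the DFKFB listing sit close to whole multiples of the basic clock cycle, so it can help to read them as cycle counts. The following is a minimal sketch under one assumption that the log does not state: that the printed "33 NSEC" basic cycle is a rounded 33 1/3 NSEC (a 30 MHz clock). The helper name cycles and the constant CYCLE_NS are hypothetical and are not part of the diagnostic.

```python
import re

# Assumption: the "33 NSEC" basic cycle is 100/3 NSEC, rounded on the printout.
CYCLE_NS = 100 / 3

# Matches the "TAKES <value> NSEC" or "TAKES <value> USEC" part of a result line.
TAKES = re.compile(r"TAKES\s+([\d.]+)\s+(NSEC|USEC)")


def cycles(line):
    """Return the nearest whole number of clock cycles for one result line.

    Returns None when the line carries no "TAKES ..." timing.
    """
    match = TAKES.search(line)
    if match is None:
        return None
    value = float(match.group(1))
    # Convert microseconds to nanoseconds where needed.
    nanoseconds = value * 1000 if match.group(2) == "USEC" else value
    return round(nanoseconds / CYCLE_NS)


for text in (
    "7 - MOVE FROM MEMORY TAKES 400 NSEC.",
    "10 - JRST TAKES 300 NSEC.",
    "14 - MUL (9 ADD/SUB - 18 SHIFTS) TAKES 2.10 USEC.",
):
    print(text, "->", cycles(text), "cycles")
```

Under that assumption, 400 NSEC reads as 12 cycles, 300 NSEC as 9 cycles, and 2.10 USEC as 63 cycles.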



